Bowen Zhang

Audio ControlNet for Fine-Grained Audio Generation and Editing

Feb 04, 2026
Via arXiv

ERNIE 5.0 Technical Report

Feb 04, 2026
Via arXiv

HY3D-Bench: Generation of 3D Assets

Feb 03, 2026
Via arXiv

ToPT: Task-Oriented Prompt Tuning for Urban Region Representation Learning

Feb 02, 2026
Via arXiv

Improving Day-Ahead Grid Carbon Intensity Forecasting by Joint Modeling of Local-Temporal and Cross-Variable Dependencies Across Different Frequencies

Jan 10, 2026
Via arXiv

CLARITY: Contextual Linguistic Adaptation and Accent Retrieval for Dual-Bias Mitigation in Text-to-Speech Generation

Nov 14, 2025
Via arXiv

MoETTA: Test-Time Adaptation Under Mixed Distribution Shifts with MoE-LayerNorm

Nov 14, 2025
Via arXiv

Semantic-VAE: Semantic-Alignment Latent Representation for Better Speech Synthesis

Sep 26, 2025
Via arXiv

MANZANO: A Simple and Scalable Unified Multimodal Model with a Hybrid Vision Tokenizer

Sep 19, 2025
Via arXiv

rStar2-Agent: Agentic Reasoning Technical Report

Aug 28, 2025
Via arXiv